This document serves as a mini-tutorial on using tmap library for creating thematic maps in R. We will be visualizing COVID-19 Cases in Toronto Neighbourhoods.
Disclaimer!!
This data is based solely on COVID-19 cases and does not consider, population, population density, multi-generational homes, long-term care, socioeconomics, access to care, unhoused populations or other factors that would impact the number of cases in a given neighbourhood.
Please note, this tutorial serves as a basis for quickly illustrating how tmap works for a Guest Lecture at University of Toronto. It is not an exhaustive resource for interpreting public health data, as COVID-19-19 is still ongoing the data presented in this tutorial is taken as a snapshot in time from the beginning of October 2020. This data is subject to change and may differ from official data sources, as they remove duplicates and resolve data issues. There is only so much that can be interpreted from maps or visualizations alone and subject matter experts (epidemiologists, public health researchers, virologists, etc.) should always be consulted when examining public health data - especially in this climate.
The metadata for COVID-19 cases by Toronto neighbourhood is available here:
https://open.toronto.ca/dataset/covid-19-cases-in-toronto/
Toronto Neighbourhoods were also downloaded from the Toronto Open data portal:
https://open.toronto.ca/dataset/neighbourhoods/
Please review the metadata page for explanations of column names and important characteristics of the data.
toronto <- st_read("./outputs/toronto_neighbourhoods.gpkg")
## Reading layer `toronto_neighbourhoods' from data source `C:\Users\Lauren\Desktop\Projects\mapping-visualizations\outputs\toronto_neighbourhoods.gpkg' using driver `GPKG'
## Simple feature collection with 140 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -79.63926 ymin: 43.581 xmax: -79.11545 ymax: 43.85546
## geographic CRS: WGS 84
neighbourhood_covid<-st_read("./outputs/toronto_neighbourhoods_covid.gpkg")
## Reading layer `toronto_neighbourhoods_covid' from data source `C:\Users\Lauren\Desktop\Projects\mapping-visualizations\outputs\toronto_neighbourhoods_covid.gpkg' using driver `GPKG'
## Simple feature collection with 140 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -79.63926 ymin: 43.581 xmax: -79.11545 ymax: 43.85546
## geographic CRS: WGS 84
covid<- read_csv('./Data/COVID19 cases.csv')
## Parsed with column specification:
## cols(
## `_id` = col_double(),
## Assigned_ID = col_double(),
## `Outbreak Associated` = col_character(),
## `Age Group` = col_character(),
## `Neighbourhood Name` = col_character(),
## FSA = col_character(),
## `Source of Infection` = col_character(),
## Classification = col_character(),
## `Episode Date` = col_character(),
## `Reported Date` = col_character(),
## `Client Gender` = col_character(),
## Outcome = col_character(),
## `Currently Hospitalized` = col_character(),
## `Currently in ICU` = col_character(),
## `Currently Intubated` = col_character(),
## `Ever Hospitalized` = col_character(),
## `Ever in ICU` = col_character(),
## `Ever Intubated` = col_character()
## )
covid<-tibble(covid, .name_repair = make.names)
We can calculate the total number of COVID-19 cases per neighbourhood:
covid_counts <- covid %>%
group_by(Neighbourhood.Name)%>%
count()
covid %>%filter(is.na(Neighbourhood.Name))
## # A tibble: 721 x 18
## X_id Assigned_ID Outbreak.Associ~ Age.Group Neighbourhood.N~ FSA
## <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 219413 19 Outbreak Associ~ 40 to 49~ <NA> <NA>
## 2 219422 28 Sporadic 20 to 29~ <NA> <NA>
## 3 219503 109 Sporadic 50 to 59~ <NA> M3K
## 4 219523 129 Sporadic 20 to 29~ <NA> M8V
## 5 219611 217 Sporadic 30 to 39~ <NA> <NA>
## 6 219651 257 Sporadic 20 to 29~ <NA> M4M
## 7 219663 269 Sporadic 50 to 59~ <NA> M8W
## 8 219725 331 Sporadic 20 to 29~ <NA> M5B
## 9 219784 393 Sporadic <NA> <NA> <NA>
## 10 219787 396 Sporadic 50 to 59~ <NA> <NA>
## # ... with 711 more rows, and 12 more variables: Source.of.Infection <chr>,
## # Classification <chr>, Episode.Date <chr>, Reported.Date <chr>,
## # Client.Gender <chr>, Outcome <chr>, Currently.Hospitalized <chr>,
## # Currently.in.ICU <chr>, Currently.Intubated <chr>, Ever.Hospitalized <chr>,
## # Ever.in.ICU <chr>, Ever.Intubated <chr>
There is missing neighbourhoods in some of the COVID-19 data (n=721). From Toronto’s open data website for the COVID-19-19 cases, the metadata indicates that neighbourhood name is created by:
Client postal code information is mapped to the most up-to-date census tract (CT) and neighbourhood information available from the city. As a result, neighbourhood information is not available for those with missing postal code or when postal code could not be mapped/linked to a CT.
Therefore, the missing data may be because of a missing postal code or for people who were tested in Toronto but live outside the city.
Let’s remove them for now
covid_counts<-covid_counts %>%filter(!is.na("Neighbourhood.Name"))
Now that our counts are joined to the spatial neighbourhoods of Toronto, we can begin to map them with Tmap.
Like ggplot2, tmap is based on the idea of a ‘grammar of graphics’ (Wilkinson and Wills 2005). This involves a separation between the input data and the aesthetics (how data are visualised): each input dataset can be ‘mapped’ in a range of different ways including location on the map (defined by data’s geometry), color, and other visual variables. The basic building block is tm_shape() (which defines input data, raster and vector objects), followed by one or more layer elements such as tm_fill() and tm_borders()
We want to map the shape of the neighbourhoods neighbourhood_covid and show the differences in the total number of cases n by using tm_fill. We can add a white border with a line width lwd = 0.5 of 0.5.
tm_shape(neighbourhood_covid) +
tm_fill("n")+
tm_borders("white", lwd=0.5)
We can change the colour palette:
tm_shape(neighbourhood_covid) +
tm_fill("n",palette = brewer.pal("Greens", n = 6))+
tm_borders("black", lwd=0.5)
tm_shape(neighbourhood_covid) +
tm_fill("n",palette = brewer.pal("Greens", n = 6),
legend.hist = T)+
tm_borders("black", lwd=0.5)+
tm_legend(outside = TRUE, hist.width=2)+
tm_layout(legend.hist.size = 0.5)
Use tm_dots for proportional symbols
tm_shape(neighbourhood_covid)+
tm_dots(size="n")+
tm_borders("black", lwd=0.5)
scale bar : tm_scale_bar
compass : tm_compass
main title main.title
neighbourhood_covid<-st_transform(neighbourhood_covid,2958)
tm_shape(neighbourhood_covid) +
tm_fill("n",palette = brewer.pal("Greens", n = 6))+
tm_borders("black", lwd=0.5)+
tm_scale_bar(width = 0.22) +
tm_compass(position = c(0.85,0.4)) +
tm_layout(main.title = "Total Positive Covid Cases by Neighbourhood", bg.color = "lightgrey", inner.margins = c(0, 0, 0, 0),
frame.lwd = 5, title.size = 1)
If we wanted to show the neighbourhoods that have more than 500 cases:
First we can subset the data :
over_500<-neighbourhood_covid %>%
filter(n>500)
First we want to show ALL the neighbourhoods like we have previously using neighbourhood_covid but this time we’re going to set the fill for these to white and the border to grey, so that they take a backseat the neighbourhood we want to highlight.
Then we can take our new subset over_500 and display this in blue on top of neighbourhood_covid. This is where the layered grammar allows us to build up our maps.
To label neighbourhood names tm_text
tm_shape(neighbourhood_covid) +
tm_fill("white")+
tm_borders("grey", lwd=0.5)+
tm_shape(over_500)+
tm_fill("lightblue")+
tm_borders("black")+
tm_text("Neighbourhood.Name", just="left", xmod=0.5, size=0.8, bg.color = "white", remove.overlap = TRUE)+
tm_layout(main.title = "Top 5 Neighbourhoods for Covid Cases", inner.margins = c(0, 0, 0, 0))
Using library rosm.
osm.raster will download tiles directly to your computer (may take some time) based on the square boundary or bounding box around Toronto. Type ?osm.raster into the console to see the options available. Type changes the type of basemap to download while crop, crops the basemap to the bbox extent.
tm_rgb will display the basemap raster.
library(rosm)
bg <- osm.raster(st_bbox(toronto),crop=TRUE)
tm_shape(bg)+
tm_rgb()+
tm_shape(toronto)+
tm_borders()
bg_hotstyle<-osm.raster(st_bbox(toronto), type="hotstyle")
tm_shape(bg_hotstyle)+
tm_rgb()+
tm_shape(toronto)+
tm_borders(col = "black")+
tm_style("classic")